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AN  APPLICATION  OF  SYNTACTIC  PATTERN  RECOGNITION 
TO  SEISMIC  DISCRIMINATION* 

HSI-HO LIU and KING-SUN FU

School of Electrical Engineering
Purdue University
West Lafayette, Indiana 47907

ABSTRACT 

Two syntactic methods for the recognition of seismic waveforms are
presented in this paper. The seismic waveforms are represented by
sentences (strings of primitives). Primitive extraction is based on
cluster analysis. Finite-state grammars are inferred from the
training samples. The nearest-neighbor decision rule and
error-correcting finite-state parsers are used for pattern
classification. While both show equal recognition performance, the
nearest-neighbor rule is much faster in computation. The
classification of real earthquake/explosion data is presented as an
application example.

I. INTRODUCTION

Seismological methods are the most effective and practical methods
for detecting nuclear explosions, especially underground
explosions. The position, depth and origin time of a seismic event
are useful information for discrimination; so are the body wave
magnitude and surface wave magnitude of the seismic wave [1, 2].
Unfortunately, they are not always applicable and reliable for small
events. It would be very helpful if the discrimination could be based
on the short-period waves alone. The application of pattern
recognition techniques to seismic wave analysis has been studied
extensively [3-5] in the last few years. These studies all use
short-period waves only, and most of them concentrate on feature
selection; only simple decision-theoretic approaches have been used.
Syntactic pattern recognition, however, appears to be quite promising
in this area because it uses the structural information of the
seismic wave, which is very important in analysis. In this paper, we
present two syntactic approaches to the recognition of seismic
waves. One uses the nearest-neighbor decision rule; the other uses
error-correcting parsing.

* This work was supported by NSF Grant PFR 79-06296 and ONR Contract
N00014-79-C-0574.

In the first method, a pattern representation subsystem converts the
seismic waveforms into strings of primitives. The string-to-string
distances between the test sample and all the training samples are
computed, and then the nearest-neighbor decision rule is applied.
The block diagram is shown in Figure 1(a). The second method
consists of pattern representation, automatic grammatical inference
and error-correcting parsing. The block diagram is shown in Figure
1(b).

The pattern representation subsystem performs pattern segmentation,
feature selection and primitive recognition so as to convert the
seismic wave into a string of primitives. The automatic grammatical
inference subsystem infers a finite-state (regular) grammar from a
finite set of training samples. The error-correcting parser can
accept erroneous and noisy patterns. Human interaction is required
only in the training stage, mostly in pattern representation and
slightly in grammatical inference. We use our syntactic pattern
recognition methods to classify nuclear explosions and earthquakes
based on the seismic P-waves. A typical sample from each class is
shown in Figure 2(a). However, extreme cases do exist. A near
explosion looks like a typical earthquake, while a deep earthquake
looks like a typical explosion. They are shown in Figure 2(b).

Figure 1(a) Block diagram of the nearest-neighbor
decision rule for string patterns.

Figure 1(b) Block diagram of the error-correcting
parsing system.

II. PATTERN REPRESENTATION

Seismic records are one-dimensional waveforms. Although there exist
several alternatives [6, 7] for representing one-dimensional
waveforms, it is most natural to represent them by sentences, i.e.,
strings of primitives. Three steps are required for the conversion:
pattern segmentation, feature selection and primitive recognition.

A. Pattern Segmentation

A digitized waveform to be processed by a digital computer is usually
sampled from a continuous waveform which represents the phenomena of
a source plus external noise. For some cases, such as EKG waves [8],
every single peak and valley is significant. Therefore these
waveforms can be segmented according to shape. For others, like EEG
[9] and seismic waves, a single peak or valley does not reveal much
information, especially when the signal-to-noise ratio is low.
Therefore, they should be segmented by length, either a fixed length
or a variable length. A variable-length segmentation is more
efficient and precise in representation, but it is usually very
difficult and time consuming to find an appropriate segmentation. A
fixed-length segmentation is much easier to implement. If the length
is kept short enough, it will be adequate to represent the original
waveform.


The selection of segment length is case dependent. It can be
anywhere between the two extremes, i.e., as long as the whole
waveform or as short as one point. There are tradeoffs between
representation accuracy and analysis efficiency. The shorter the
segments are, the more accurate the representation will be. But the
analysis becomes less efficient, since the string is longer and the
parsing time is proportional to the string length. Another problem is
noise. If the segments are too short, the representation will be very
sensitive to noise. A rule of thumb is that each segment should
contain several periods of the waveform. In our seismic data base
each seismic record contains 1200 sample points. The sampling
frequency is 10 points per second. Each record is divided into 20
segments with 60 points in each segment.

B. Feature Selection

This is the most difficult and critical part of pattern recognition.
Any linear or nonlinear function of the original measurements may be
considered as a feature, provided it gives discriminating power. Both
time-domain and frequency-domain features have been used for seismic
discrimination. For example, complexity and autoregressive models
are features in the time domain; spectral ratio and third moment of
frequency are features in the frequency domain [2]. Since we segment
the seismic wave, complexity and spectral ratio features are
implicitly contained in the string structure. Furthermore, a segment
may be too short for model estimation if shorter segments are used.
Therefore, we selected a pair of commonly used features, the zero
crossing count and the log energy of each segment, which are easy to
compute and contain significant information. Other features may also
serve as good candidates. An advantage of the syntactic approach is
that feature selection is simpler, since features are extracted from
segments and each segment is much simpler than the whole waveform.
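
The segmentation and feature computation described above can be sketched as follows. This is only an illustration: the paper does not give exact formulas for the two features, so the sign-change definition of a zero crossing, the natural logarithm, the small constant added before taking the log, and the name `segment_features` are all assumptions of this sketch. The record is synthetic, not seismic data.

```python
import math

def segment_features(record, seg_len=60):
    """Split a record into fixed-length segments and return
    (zero-crossing count, log energy) for each segment."""
    features = []
    for start in range(0, len(record) - seg_len + 1, seg_len):
        seg = record[start:start + seg_len]
        # Assumed definition: a crossing is a sign change between
        # two consecutive samples.
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
        energy = sum(x * x for x in seg)
        # The small constant avoids log(0) on an all-zero segment
        # (an assumption of this sketch, not part of the paper).
        features.append((crossings, math.log(energy + 1e-12)))
    return features

# Synthetic 1200-point record: a 1 Hz sine sampled at 10 points per second.
record = [math.sin(2 * math.pi * (n + 0.5) / 10.0) for n in range(1200)]
feats = segment_features(record)
print(len(feats), feats[0])
```

With 1200 points and 60 points per segment, each record yields 20 feature pairs, one per primitive position in the final sentence.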

C. Primitive Recognition

After segmentation and feature selection, primitives can be
recognized from the analysis of training segments, and an identifier
assigned to each segment. This problem can be solved in two ways:
the segments are classified either by human experts or by a computer.
We choose the latter, since human classifications are not always
available and reliable. In addition, we need to try different segment
lengths in order to find an optimal segmentation. Therefore, we use
automatic clustering analysis to classify each segment. In the
clustering process, similar samples are grouped together. The
similarity between a pair of samples is usually defined by the
distance between them. Each segment is represented by a vector
$X = (x_1, \ldots, x_k)$, where $x_i$, $1 \le i \le k$, is the i-th
feature and k is the total number of features. In our case, k = 2.

If the number of clusters is known, then the K-means algorithm can be
applied to find a clustering which minimizes a performance index.
When the number of clusters is unknown, there is no universally
applicable algorithm to determine the optimal cluster number. We use
a bottom-up hierarchical clustering algorithm [10] to find the
clusterings for a sequence of cluster numbers. The starting cluster
number can be arbitrarily selected. It may equal the number of
training segments, but this is too time consuming even for a
moderately large training set. Therefore we start from a smaller
number, say 20, and find the clustering using the K-means algorithm.
The nearest pair of clusters is then merged and the cluster number is
decreased by one. The K-means algorithm is applied again for
reorganization. This clustering-merging cycle repeats until the
cluster number reaches a preset lower bound, say 3, and then the
procedure stops.


Algorithm 1: Bottom-Up Hierarchical Clustering

Input: A set of n unclassified samples, an upper bound U
and a lower bound L.

Output: A sequence of optimal clusterings for the number of
clusters between U and L.

Method:

1) Let c = U, where c is the number of clusters, and arbitrarily
assign cluster membership.

2) Reassign membership using the K-means algorithm. If
$c \le L$, stop.

3) Find the nearest pair of clusters, say $X_i$ and $X_j$,
$i \ne j$.

4) Merge $X_i$ and $X_j$, delete $X_j$ and decrease c by one.
Go to step 2.


The distance between two clusters is defined by

$d(X_i, X_j) = \| m_i - m_j \|$,

where $m_i$ and $m_j$ are the mean vectors of clusters i and j,
respectively.
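
A minimal sketch of the clustering-merging cycle of Algorithm 1 is given below, under several assumptions that are not in the paper: the initial cluster means are simply the first U samples, empty clusters are dropped, the Euclidean norm is used for the cluster distance, and the function names (`kmeans`, `bottom_up_clustering`) and the toy samples are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    return tuple(sum(p[i] for p in points) / len(points)
                 for i in range(len(points[0])))

def kmeans(samples, centers, max_iter=100):
    """Reassign membership until the cluster means stop changing."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for s in samples:
            nearest = min(range(len(centers)), key=lambda c: dist(s, centers[c]))
            groups[nearest].append(s)
        groups = [g for g in groups if g]      # drop empty clusters
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

def bottom_up_clustering(samples, upper, lower):
    """Return ({cluster count: means}, [(cluster count, merge distance)])."""
    centers, groups = kmeans(samples, list(samples[:upper]))   # steps 1 and 2
    clusterings = {len(centers): centers}
    history = []
    while len(centers) > lower:
        # Step 3: nearest pair of cluster means.
        d, i, j = min((dist(centers[a], centers[b]), a, b)
                      for a in range(len(centers))
                      for b in range(a + 1, len(centers)))
        # Step 4: merge the pair, then reorganize with K-means (step 2).
        merged = mean(groups[i] + groups[j])
        start = [c for n, c in enumerate(centers) if n not in (i, j)] + [merged]
        centers, groups = kmeans(samples, start)
        clusterings[len(centers)] = centers
        history.append((len(centers), d))
    return clusterings, history

samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
           (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
clusterings, history = bottom_up_clustering(samples, upper=6, lower=3)
print(history)
```

The `history` list records the merge distance at each step, which is the quantity inspected when choosing the cluster number.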


The main problem that still remains unsolved at this point is to
determine the optimal cluster number. Some criteria have been
suggested for this purpose. However, they are not always applicable.
We determine the cluster number by inspecting the increment of merge
distance. When a merge of two clusters is natural, the increment of
merge distance should be small; otherwise it will be large. This can
only be determined from a sequence of cluster numbers. The merge
distances of our training samples from 18 clusters down to 7 clusters
are shown in Table I. The increments of merge distance become
considerably larger after ten clusters. Therefore, it is reasonable
to select ten as the optimal number of clusters. After the cluster
number had been determined, an identifier was assigned to each
cluster. A test segment is assigned to the cluster whose distance to
the test segment is the smallest. All the seismic waves are thereby
converted into strings of primitives, or sentences.
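
The final conversion step, assigning each segment to its nearest cluster and emitting that cluster's identifier, might look like the sketch below. The cluster means, the feature values and the letters used as identifiers are invented for illustration; the paper does not state which symbols it uses or whether the two features are rescaled before distances are computed.

```python
import math

def to_sentence(feature_vectors, cluster_means, alphabet="abcdefghij"):
    """Map each segment's feature vector to the symbol of the nearest cluster mean."""
    symbols = []
    for x in feature_vectors:
        nearest = min(range(len(cluster_means)),
                      key=lambda c: math.dist(x, cluster_means[c]))
        symbols.append(alphabet[nearest])
    return "".join(symbols)

# Hypothetical cluster means (zero-crossing count, log energy) and segments.
means = [(5.0, 1.0), (12.0, 3.0), (20.0, 6.0)]
segments = [(6.0, 1.2), (11.0, 2.5), (19.0, 5.5), (13.0, 3.1)]
print(to_sentence(segments, means))
```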


TABLE I

Merge distances of the bottom-up hierarchical clustering process
(columns: cluster number, merge distance, increment of merge distance)

Merge distance: 29.9, 36.6, 37.7, 43.7, 47.6, 57.2, 67.4, 94.5,
105.4, 144.9, 187.1


III. SYNTAX ANALYSIS

If classification is all we need, then the nearest-neighbor decision
rule is recommended because of its computational efficiency. On the
other hand, if a complete description of the waveform structure is
needed, we have to use (error-correcting) parsing. An
error-correcting parser instead of a regular parser is required for
most practical pattern recognition applications, since noise and
errors in earlier processing usually cause regular parsers to fail.
It is not unusual that even a perfect pattern cannot be parsed by a
regular parser, especially when the grammar is inferred from a small
set of samples. In that case, error-correcting parsing is equivalent
to finding the distance between a sentence and a language. The parse
of the sentence may contain some error productions.


A. Nearest-Neighbor Decision Rule

The concept of the nearest-neighbor decision rule in the syntactic
approach is the same as that in the decision-theoretic approach. The
only difference is in the distance calculation. The distance between
two strings is sometimes called the Levenshtein distance [11], which
is the minimum number of symbol insertions, deletions and
substitutions required to transform one string into the other. If
different weights are assigned to different symbols and/or
operations, then the distance becomes a weighted Levenshtein
distance. These distances can be computed by dynamic programming
[12]. Figure 5 shows the shortest path which transforms the string on
the left into the string on the top.


Figure 5 The shortest path which transforms string 'ababb'
into string 'aabaab'. The distance between these
two strings is 2. Horizontal movement means
insertion; vertical movement means deletion;
diagonal movement means substitution. Each
insertion, deletion and substitution has the same
weight 1.
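
The dynamic programming computation of the (weighted) Levenshtein distance, together with the nearest-neighbor decision, can be sketched as follows. The function names and the two labelled training strings are hypothetical; the unit weights reproduce the example of Figure 5.

```python
def levenshtein(x, y, sub_cost=lambda a, b: 1, ins_cost=1, del_cost=1):
    """Weighted Levenshtein distance from string x to string y."""
    rows, cols = len(x) + 1, len(y) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):            # vertical moves: deletions
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, cols):            # horizontal moves: insertions
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = x[i - 1] == y[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,
                d[i][j - 1] + ins_cost,
                # Diagonal move: free if the symbols match, else a substitution.
                d[i - 1][j - 1] + (0 if same else sub_cost(x[i - 1], y[j - 1])),
            )
    return d[-1][-1]

def nearest_neighbor_class(test, training):
    """training is a list of (string, label); return the label of the closest string."""
    return min(training, key=lambda pair: levenshtein(test, pair[0]))[1]

print(levenshtein("ababb", "aabaab"))   # 2, as in Figure 5
training = [("aabaab", "explosion"), ("bbbbbb", "earthquake")]
print(nearest_neighbor_class("aabab", training))
```

Passing a `sub_cost` function based on inter-cluster distances would give a weighted variant in the spirit of the weights described later for substitution errors.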


B. Error-Correcting Finite-State Parsing

Before parsing can take place we must have a grammar, which can be
either heuristically constructed or inferred from a set of training
samples. In order to study the learning capability of our syntactic
method, we choose the grammatical inference approach.

Phrase Structure Grammar

A phrase structure grammar G is a 4-tuple
$G = (V_N, V_T, P, S)$, where

$V_N$ : finite set of nonterminal symbols,

$V_T$ : finite set of terminal symbols, $V_N \cup V_T = V$,
$V_N \cap V_T = \emptyset$,

S : start symbol, $S \in V_N$,

P : finite set of productions or rewrite rules of the
form $\alpha \to \beta$, $\alpha, \beta \in V^*$, $\alpha \ne \lambda$.
$V^*$ is the set of all finite-length strings of symbols from V,
including $\lambda$, the null string; $V^+ = V^* - \{\lambda\}$.

Let $G = (V_N, V_T, P, S)$ be a grammar. If every production in
P is of the form $A \to aB$ or $A \to a$, with $A, B \in V_N$ and
$a \in V_T$, then the grammar G is finite-state or regular [13].

Phrase structure grammars have been used to describe patterns in
syntactic pattern recognition [14]. Each pattern is represented by a
string of primitives which corresponds to a sentence in a language
(a tree or graph in high-dimensional grammars). All strings which
belong to the same class are generated by one grammar.


Inference

A set of sentences $S^+$ is a positive sample of a language L(G) if
$S^+ \subseteq L(G)$. A set of sentences $S^-$ is a negative sample
of a language L(G) if $S^- \subseteq \overline{L(G)}$, the
complement of L(G).

A positive sample $S^+$ of a language L(G) is structurally complete
if each production in G is used in the generation of at least one
string in $S^+$ [15].

We assume that the set $S^+$ is structurally complete and
$S^+ \subseteq L(G_e)$, where $G_e$ is the inferred grammar.
Theoretically, if $S^+$ is a structurally complete sample of the
language L(G) generated by the finite-state grammar G, then the
canonical grammar $G_c$ can be inferred from $S^+$. A set of derived
grammars can be derived from $G_c$. The derived grammars are
obtained by partitioning the set of nonterminals of the canonical
grammar into equivalence classes. Each nonterminal of the derived
grammar corresponds to one block of the partition. Since the number
of possible partitions is too large, it is infeasible to evaluate
all the partitions. Therefore some algorithms, such as the k-tail
algorithm [16], have been suggested to reduce the number of derived
grammars. These algorithms have one disadvantage: the reduced subset
of derived grammars may not contain the source grammar. However,
this is sufficient if one is only interested in an estimate of the
source grammar. There are at least two situations where a
grammatical inference algorithm can be used. In the first case there
exists a source grammar which generates a language, and we want to
infer the source grammar or automaton based on the observed samples.
In the second case the exact nature of the source grammar is
unknown; the only information we have is some sentences generated by
the source. We assume that the source grammar falls into a
particular class and infer a grammar which generates all the
training samples and, hopefully, will generate some other samples
belonging to the same class. If a negative sample set is given, the
inferred grammar must not generate any sample in the negative sample
set.

Grammars more complex than finite-state grammars and restricted
context-free grammars (in the Chomsky hierarchy) cannot be inferred
efficiently without human interaction. Therefore we choose
finite-state grammars to describe the seismic waves. Another reason
is that no obvious self-embedding property appears in seismic waves,
so finite-state grammars are sufficient in generating power.

The inference of regular grammars has been studied extensively. The
k-tail algorithm finds the canonical grammar first and then merges
the states which are k-tail equivalent. This algorithm is adjustable:
the value of k controls the size of the inferred grammar. Another
algorithm, called the tail-clustering algorithm [17], also finds the
canonical grammar first, but then merges the states which have common
tails. This algorithm is not as flexible as the k-tail algorithm, but
will infer a grammar which is closer to the source grammar in some
cases. Since the grammar is inferred from a small set of training
samples, we can only expect that the inferred grammar generates all
the training samples and will generate other strings which are
similar to the training samples.
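
The idea of merging k-tail equivalent states can be illustrated with the sketch below. It is a simplified reading, not the implementation used in the experiments: states are taken to be the prefixes of the training strings, the k-tail of a prefix is assumed to be the set of continuations of length at most k that complete some training string, and prefixes with identical k-tails are merged. The function names and the two sample strings are hypothetical.

```python
def k_tail_automaton(samples, k):
    """Merge prefix states of the training strings that share the same k-tail."""
    prefixes = {s[:i] for s in samples for i in range(len(s) + 1)}

    def k_tail(prefix):
        # Continuations of length <= k that complete a training string.
        return frozenset(s[len(prefix):] for s in samples
                         if s.startswith(prefix) and len(s) - len(prefix) <= k)

    # Each merged state is identified by the k-tail its prefixes share.
    state_of = {p: k_tail(p) for p in prefixes}
    states = set(state_of.values())
    transitions = {(state_of[p[:-1]], p[-1], state_of[p]) for p in prefixes if p}
    finals = {state_of[s] for s in samples}
    return states, state_of[""], finals, transitions

def accepts(automaton, string):
    """Simulate the (possibly nondeterministic) merged automaton."""
    _, start, finals, transitions = automaton
    current = {start}
    for symbol in string:
        current = {t for (s, a, t) in transitions if s in current and a == symbol}
    return bool(current & finals)

samples = ["ab", "aab"]
small_k = k_tail_automaton(samples, k=1)
large_k = k_tail_automaton(samples, k=2)
print(len(small_k[0]), accepts(small_k, "aaab"))
print(len(large_k[0]), accepts(large_k, "aaab"))
```

In this toy case the smaller k gives fewer states and a grammar that also accepts strings outside the training set, while the larger k reproduces the training set exactly, which mirrors the trade-off controlled by k in the text.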


The generating power of the inferred grammar relies entirely on the
merge procedure. If no merge takes place, then the inferred grammar
generates exactly the training set, no more, no less. Since all the
seismic records have the same length in our example, the sentences
representing these signals also have the same length. No merge of
states happens in our experiment when the tail-clustering algorithm
is used.

Error-Correcting Parsing

After a grammar is available, either by inference or construction,
the next step is to design a recognizer which will recognize the
patterns generated by the grammar. If the grammar G is finite-state,
a deterministic finite-state automaton can be constructed to
recognize the strings generated by G.

Noise and primitive recognition errors usually occur in practice.
Conventional parsing algorithms cannot handle these situations. A
few approaches have been proposed; error-correcting parsing is one of
them [18]. The pattern grammar is first transformed into a covering
grammar that generates the correct sentences as well as all the
possible erroneous sentences. The errors in string patterns are
substitution, deletion and insertion errors. For a nonstochastic
grammar, the minimum-distance criterion can be used for
error-correcting parsing.

Since all the sentences in our example have the same length, only the
substitution error needs to be considered. For each production
$A \to aB$ and $A \to a$ in the original grammar we add $A \to bB$
and $A \to b$, respectively, to the covering grammar, where
$A, B \in V_N$, $a, b \in V_T$, $b \ne a$. A different weight can be
assigned to each error production, resulting in a minimum-cost
error-correcting parser. The assignment of weights is crucial. We
use the distance between clusters a and b as the weight for
substituting a by b and vice versa. Since a finite-state grammar can
be represented by a transition diagram, minimum-cost
error-correcting parsing is equivalent to finding a minimum-cost path
from the initial state to the final state. The parsing time is
proportional to the length of the sentence.


Algorithm 2: Minimum-Cost Paths

Input: A transition diagram with n nodes numbered 1, 2, ..., n,
where node 1 is the initial state and node n is the final state,
and a cost function $c_{ij}(a)$, for $1 \le i, j \le n$,
$a \in \Sigma$, with $c_{ij}(a) \ge 0$ for all i and j. An input
string s.

Output: $m_{1n}$, the lowest cost of any path from node 1 to node n
whose sequence is equal to that of the input string s.

Method:

1) Set k = 1.

2) For all $1 \le j \le n$, $m_{1j} = \min \{ m_{1i} + c_{ij}(b)$,
for all $1 \le i \le n \}$, where b is the k-th symbol of input
string s.

3) If $k < |s|$, increase k by 1 and go to step (2).
If $k = |s|$, go to step (4).

4) Output $m_{1n}$, which is the lowest cost from node 1 to
node n following the move of input string s. Stop.
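
Restricted to substitution errors, the minimum-cost path search can be sketched as follows. Several details are assumptions of this sketch rather than statements from the paper: states are numbered from 0, the cost vector is initialized to zero at the initial state and infinity elsewhere, a transition labelled a read on input symbol b costs 0 if a = b and `weight(a, b)` otherwise, and the two tiny grammars and the weight table are invented.

```python
import math

def min_cost_parse(transitions, n_states, string, weight, start=0, final=None):
    """Lowest substitution cost of driving the automaton from start to final on string."""
    final = n_states - 1 if final is None else final
    cost = [math.inf] * n_states
    cost[start] = 0.0
    for b in string:                       # one pass per input symbol
        new_cost = [math.inf] * n_states
        for (i, a, j) in transitions:      # transition i --a--> j
            step = 0.0 if a == b else weight(a, b)
            if cost[i] + step < new_cost[j]:
                new_cost[j] = cost[i] + step
        cost = new_cost
    return cost[final]

# Hypothetical substitution weights (distance between clusters a and b).
table = {frozenset("ab"): 1.5}
def weight(a, b):
    return table[frozenset((a, b))]

# Each toy grammar generates a single training sentence.
explosion = [(0, "a", 1), (1, "a", 2), (2, "b", 3)]     # "aab"
earthquake = [(0, "b", 1), (1, "b", 2), (2, "a", 3)]    # "bba"

test = "abb"
costs = {"explosion": min_cost_parse(explosion, 4, test, weight),
         "earthquake": min_cost_parse(earthquake, 4, test, weight)}
print(costs, min(costs, key=costs.get))
```

The test string is assigned to the class whose grammar yields the lower cost. Each input symbol triggers one sweep over the transitions, so the running time grows linearly with the sentence length, as stated above.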


The production number can be stored with $c_{ij}(a)$, and the
parse can be stored with $m_{1j}$.

If insertion and deletion errors are to be considered, then the
parser will still be similar, except that we have to compute and
store the information V(T, S, a), which is the minimum cost of
changing character 'a' into some string which can change the state
of the automaton from state T to S [19]. The inclusion of insertion
and deletion errors makes the error correction more complete, but
assigning appropriate weights to insertion and deletion errors could
be even more difficult.

IV. EXPERIMENTAL RESULTS

Our seismic data were provided by Professor C. H. Chen. They were
recorded at LASA in Montana. The original data contain 323 records.
Due to some technical problems in data conversion we obtained only
321 records. Among them, 111 records are nuclear explosions and 210
records are earthquakes. The experiment was run on a VAX 11/780
computer using the PASCAL programming language. A set of 50
carefully selected samples from each class is used as training
samples. The remaining 221 samples are test samples. The weights for
substitution errors are shown in Table II. Table III and Table IV
give information about the inferred grammars and the parsing. The
grammars are inferred using the k-tail algorithm with different
values of k. Table III contains the number of nonterminals, the
number of productions and the number of negative samples accepted.
Table IV contains the average parsing time for one string and the
percentage of correct classification. It can be seen that as the
value of k becomes smaller, the parsing time becomes shorter but the
classification error becomes larger. This results from the uneven
merge of the nonterminals. Due to the characteristics of our sample
set, only those states having the longest tails are merged. The
results using the nearest-neighbor decision rule are shown in Table
V. It compares the string-to-string distance between the test sample
and the whole class of training samples. Although the best
recognition performance of the two methods is about the same, the
nearest-neighbor decision rule is much faster in practical
computation than the error-correcting parsers.

TABLE II

Weights for substitution errors

TABLE III

The number of nonterminals, productions and negative samples
accepted for the inferred grammars. The inference algorithm
is the k-tail algorithm with different values of k.

       Explosion             Earthquake            No. of negative
 k     Nonterm.   Product.   Nonterm.   Product.   samples accepted
 20    748        796        746        794        0
 19    748        796        746        794        0
 18    741        796        737        794        0
 17    722        778        715        772        0
 16    694        751        686        743        0
 15    656        714        650        708        0
 14    410        668        608        666        0
 13    ?          618        561        619        0
 12    510        568        311        569        0
 11    460        518        461        519        0
 9     360        418        361        419        0
 7     262        319        261        319        2
 5     166        222        164        220        6

TABLE IV

The average parsing time and percentage of correct
classification of the error-correcting parsers with
different values of k.

 k     Average parsing time     Percentage of correct
       for one string (sec)     classification (%)
 20    2.6                      90.5
 19    2.6                      90.5
 18    2.8                      85.5
 17    2.7                      82.8
 16    2.6                      75.6
 15    2.5                      76.0
 14    2.4                      73.8
 13    2.1                      73.3
 12    1.9                      72.9
 11    1.7                      71.0
 9     1.4                      70.1
 7     1.1                      70.6
 5     0.8                      60.2

TABLE V

Classification results using the
nearest-neighbor decision rule

 Average time for       Percentage of correct
 one string (sec)       classification
 0.07                   90.5 %

200 records are correctly classified out of 221.

Though the classification results seem satisfactory, they are very
sensitive to the feature selection, the selection of training
samples and the weight assignment of error productions. Although a
finite set of samples has some limitations, it still makes sense to
pursue more studies of the following problems.

1. Feature selection. How to find a set of distinguishable features
is the most important part in practical applications. The difficulty
increases when the classes are somewhat overlapped. Possible
solutions are finding some kind of transformation which will
separate the classes or selecting the most distinguishable features.
Most of the features which are effective for the statistical
approach can be used for the syntactic approach. The selection of
the number of features also deserves consideration. Some criteria
are needed so that a judgement can be made.

2. Selection of training samples. It would be helpful if human
experts are available for consultation. Clustering techniques can be
used to get an initial training set, which can then be adjusted to
obtain the best results. Clustering techniques can also be used to
find good prototypes from a set of samples. A small set of
well-selected training samples will certainly reduce computation
time and, in the meantime, may improve the classification accuracy.

3. Weight assignment of error productions. This part is very
important in error-correcting parsing, and only exists in the
syntactic approach. Equal weight assignment is very easy to
implement and has been used. However, it is not always appropriate,
since costs should be different for different errors. The similarity
between two primitives is a good reference for assigning weights to
substitution errors. The weights of insertion and deletion errors
are more difficult to assign. Only heuristic approaches are known so
far.